Hanson–Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data

Abstract

We derive a dimension-free Hanson–Wright inequality for quadratic forms of independent sub-gaussian random variables in a separable Hilbert space. Our inequality is an infinite-dimensional generalization of the classical Hanson–Wright inequality for finite-dimensional Euclidean random vectors. We illustrate an application to the generalized $K$-means clustering problem for non-Euclidean data. Specifically, we establish the exponential rate of convergence for a semidefinite relaxation of the generalized $K$-means, which, together with a simple rounding algorithm, implies the exact recovery of the true clustering structure.
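
For orientation, the classical finite-dimensional inequality that the abstract generalizes can be stated as follows, in the form popularized by Rudelson and Vershynin. The symbol $L$ for the sub-gaussian norm bound is chosen here to avoid a clash with the number of clusters $K$. The paper's Hilbert-space statement has its own hypotheses and constants, which are not reproduced here. Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ have independent mean-zero components with $\max_i \|X_i\|_{\psi_2} \le L$, and let $A \in \mathbb{R}^{n \times n}$. Then for every $t \ge 0$,

$$
\mathbb{P}\left( \left| X^\top A X - \mathbb{E}\, X^\top A X \right| > t \right)
\le 2 \exp\left[ -c \min\left( \frac{t^2}{L^4 \|A\|_F^2}, \frac{t}{L^2 \|A\|_{\mathrm{op}}} \right) \right],
$$

where $c > 0$ is an absolute constant, $\|A\|_F$ is the Frobenius norm and $\|A\|_{\mathrm{op}}$ is the operator norm. The ambient dimension $n$ enters the bound only through these two norms of $A$.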
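
Generalized $K$-means for non-Euclidean data relies on the fact that the $K$-means objective can be written entirely in terms of inner products, so it makes sense in any Hilbert space. The Python sketch below illustrates this under stated assumptions and is not the paper's code. `within_cluster_ss` evaluates the objective from a Gram matrix $A_{ij} = \langle x_i, x_j \rangle$. `ideal_sdp_solution` builds the block matrix $Z = \sum_k |G_k|^{-1} \mathbf{1}_{G_k} \mathbf{1}_{G_k}^\top$, which a Peng and Wei style relaxation (maximize $\langle A, Z \rangle$ subject to $Z \succeq 0$, $Z \ge 0$, $\operatorname{tr} Z = K$, $Z \mathbf{1} = \mathbf{1}$) is assumed to return under exact recovery. `round_block_matrix` is a hypothetical thresholding rule that reads the clusters back off $Z$. All three names are invented for this illustration, and the paper's rounding algorithm may differ.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def within_cluster_ss(gram: Matrix, labels: Sequence[int]) -> float:
    """K-means objective from inner products only (no coordinates needed).

    With gram[i][j] = <x_i, x_j>, each cluster G contributes
    sum_{i in G} <x_i, x_i> - (1/|G|) * sum_{i, j in G} <x_i, x_j>,
    which equals sum_{i in G} ||x_i - mean(G)||^2.
    """
    clusters: Dict[int, List[int]] = {}
    for i, k in enumerate(labels):
        clusters.setdefault(k, []).append(i)
    total = 0.0
    for members in clusters.values():
        diag = sum(gram[i][i] for i in members)
        block = sum(gram[i][j] for i in members for j in members)
        total += diag - block / len(members)
    return total


def ideal_sdp_solution(labels: Sequence[int]) -> List[List[float]]:
    """Block matrix Z with Z[i][j] = 1/|G_k| if i and j share cluster k, else 0."""
    sizes: Dict[int, int] = {}
    for k in labels:
        sizes[k] = sizes.get(k, 0) + 1
    n = len(labels)
    return [
        [1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def round_block_matrix(z: Matrix, tol: float = 1e-6) -> List[int]:
    """Hypothetical rounding: each unlabeled row i opens a new cluster
    containing every j with z[i][j] > tol. Exact only for block-form z."""
    n = len(z)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        for j in range(n):
            if z[i][j] > tol:
                labels[j] = next_label
        next_label += 1
    return labels


if __name__ == "__main__":
    # Four points on the real line, viewed as a one-dimensional Hilbert space.
    x = [0.0, 1.0, 10.0, 11.0]
    gram = [[a * b for b in x] for a in x]
    print(within_cluster_ss(gram, [0, 0, 1, 1]))  # 1.0
    print(within_cluster_ss(gram, [0, 1, 0, 1]))  # 100.0
    z = ideal_sdp_solution([0, 0, 1, 1])
    print(round_block_matrix(z))  # [0, 0, 1, 1]
```

For the block matrix of any partition, the objective equals $\operatorname{tr}(A) - \langle A, Z \rangle$. Minimizing it over partitions is therefore the same as maximizing $\langle A, Z \rangle$, and the semidefinite program relaxes exactly that problem.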

Similar resources

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Parallel K-Means Clustering with Triangle Inequality

Clustering divides data objects into groups to minimize the variation within each group. This technique is widely used in data mining and other areas of computer science. K-means is a partitional clustering algorithm that produces a fixed number of clusters through an iterative process. The relative simplicity and obvious data parallelism of the K-means algorithm make it an excellent candidate ...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...

Clustering Stable Instances of Euclidean k-means

The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd’s algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to desi...

Spatial Analysis in curved spaces with Non-Euclidean Geometry

The ultimate goal of spatial information, both as part of technology and as science, is to answer questions and issues related to space, place, and location. Therefore, geometry is widely used for description, storage, and analysis. Undoubtedly, one of the most essential features of spatial information is geometric features, and one of the most obvious types of analysis is the geometric type an...

Journal

Journal title: Bernoulli

Year: 2021

ISSN: 1573-9759, 1350-7265

DOI: https://doi.org/10.3150/20-bej1251